Evaluating Stemmers and Retrieval Fusion Approaches for Hindi: UNT at FIRE 2010

Authors

  • Miguel E. Ruiz
  • Bharath Dandala
Abstract

This paper describes the experiments conducted by the University of North Texas team as part of our participation in the Forum for Information Retrieval Evaluation (FIRE). We concentrated on three things: comparing retrieval results obtained with two morphological stemmers (YASS and Morfessor), studying the effect of using a part-of-speech tagger (Conditional Random Fields) to weight the contribution of words in noun phrases, and using a data fusion approach to improve system performance by combining these methods. We conducted our study on Hindi and explored cross-language retrieval performance from English to Hindi using Google translations. Our results show that the YASS stemmer yields a small increase in retrieval performance. Fusion of results also proved effective, improving results by 5% in our experiments.
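The abstract does not say which fusion rule was used, so the following is only a minimal sketch of one common choice, CombSUM over min-max normalized scores, to illustrate what combining the result lists of several runs can look like. The function names, the run names, and all document IDs and scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale the scores of one run to [0, 1]; a constant run maps to 1.0."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs, weights=None):
    """Fuse several runs by summing their normalized scores (CombSUM).

    Assumption: this is a generic fusion rule, not necessarily the one
    used in the paper. Each run maps a document ID to a retrieval score.
    """
    weights = weights or [1.0] * len(runs)
    fused = defaultdict(float)
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            fused[doc] += weight * score
    # Sort by fused score (descending), breaking ties by document ID.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores for one query from two differently stemmed indexes.
yass_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
morfessor_run = {"d2": 8.0, "d4": 6.0, "d1": 4.0}

ranking = comb_sum([yass_run, morfessor_run])
for doc, score in ranking:
    print(f"{doc}\t{score:.3f}")
```

In this toy input, a document that scores well in both runs (d2) rises above one that tops only a single run (d1), which is the usual motivation for fusing runs built with different stemmers.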

Similar resources

DCU@FIRE-2012: Rule-based Stemmers for Bengali and Hindi

For the participation of Dublin City University (DCU) in the FIRE-2012 Morpheme Extraction Task (MET), we investigated rule-based stemming approaches for Bengali and Hindi IR. The MET task itself is an attempt to obtain a fair and direct comparison between various stemming approaches, measured by comparing the retrieval effectiveness obtained by each on the same dataset. Linguistic knowledge w...

FIRE-2008 at Maryland: English-Hindi CLIR

In this year's Forum for Information Retrieval Evaluation (FIRE), the University of Maryland participated in the ad hoc cross-language document retrieval task, with English queries and Hindi documents. The experiments focused on evaluating the effectiveness of a “meaning matching” approach based on translation probabilities. The FIRE Hindi test collection provides the first opportunity to ...

Developing Morphological Analysers for South Asian Languages: Experimenting with the Hindi and Gujarati Languages

A considerable amount of work has been put into development of stemmers and morphological analysers. The majority of these approaches use hand-crafted suffix-replacement rules but a few try to discover such rules from corpora. While most of the approaches remove or replace suffixes, there are examples of derivational stemmers which are based on prefixes as well. In this paper we present a rule-...

DCU@FIRE2012: Monolingual and Crosslingual SMS-based FAQ Retrieval

This paper presents results for DCU’s second participation in the SMS-based FAQ Retrieval task at FIRE. For FIRE 2012, we submitted runs for the monolingual English and Hindi and the crosslingual English to Hindi subtasks. Compared to our experiments for FIRE 2011, our system was simplified by using a single retrieval engine (instead of three) and using a single approach for detecting out-of-do...

DCU@FIRE2010: Term Conflation, Blind Relevance Feedback, and Cross-Language IR with Manual and Automatic Query Translation

For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, information retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conflation (different stemming approaches and indexing word prefixes), blind relevance feedback, and manual and automatic query translation. The experiments are based on BM25 ...

Publication date: 2010